AITopics | query position

Collaborating Authors

query position

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues

Li, Haoyang, Xu, Zhanchao, Li, Yiming, Chen, Xuejia, Li, Darian, Tian, Anxin, Xiao, Qingfa, Deng, Cheng, Wang, Jun, Li, Qing, Chen, Lei, Yuan, Mingxuan

arXiv.org Artificial IntelligenceSep-29-2025

Multi-turn dialogues are essential in many real-world applications of large language models, such as chatbots and virtual assistants. As conversation histories become longer, existing large language models face increasing computational and memory challenges, which hinder their ability to provide efficient and responsive interactions. Most current acceleration methods either compress the context or optimize key value caching, but they often rely on fixed or position-based heuristics that do not adapt well to the dynamic and unpredictable patterns found in actual multi-turn conversations. As a result, these models cannot accurately identify and prioritize the most relevant context, leading to degraded response quality. In this paper, we present LoopServe, an adaptive dual-phase inference acceleration framework for large language models in multi-turn dialogues. LoopServe introduces two main innovations. First, it performs online sparsification during the prefilling phase by dynamically selecting the most important parts of the attention matrix for each new input. Second, it uses progressive key value compression during decoding by adaptively maintaining a relevant and efficient cache based on the most recently generated output tokens. We also propose a new benchmark with eleven multi-turn datasets that reflect realistic query positions and conversational dependencies. Extensive experiments demonstrate that LoopServe consistently achieves superior effectiveness compared to existing baselines and significantly accelerates LLM inference across a wide range of long-context dialogue tasks.

benchmark, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.13681

Country:

Europe > Austria > Vienna (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection

Fan, Li, Wang, Peng, Yang, Jing, Shen, Cong

arXiv.org Machine LearningJun-27-2025

Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work, we propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models (1-2 layers) without increasing model depth. This design enables lightweight Transformers to achieve detection performance comparable to much deeper models, making them well-suited for deployment on resource-constrained mobile devices. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers, while maintaining storage and computational efficiency. This represents a promising direction for implementing Transformer-based algorithms in wireless receivers with limited computational resources.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2506.21093

Country:

North America > United States > Virginia (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.91)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis

Hong, Guan Zhe, Dikkala, Nishanth, Luo, Enming, Rashtchian, Cyrus, Wang, Xin, Panigrahy, Rina

arXiv.org Artificial IntelligenceDec-9-2024

Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve. We perform our study on two fronts. First, we pursue an understanding of precisely how a three-layer transformer, trained from scratch and attains perfect test accuracy, solves this problem. We are able to identify certain "planning" and "reasoning" mechanisms in the network that necessitate cooperation between the attention blocks to implement the desired logic. Second, we study how pretrained LLMs, namely Mistral-7B and Gemma-2-9B, solve this problem. We characterize their reasoning circuits through causal intervention experiments, providing necessity and sufficiency evidence for the circuits. We find evidence suggesting that the two models' latent reasoning strategies are surprisingly similar, and human-like. Overall, our work systemically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason.

activation, attention head, attention weight, (16 more...)

arXiv.org Artificial Intelligence

2411.04105

Country:

North America > United States (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

Cao, Yue, Xu, Jiarui, Lin, Stephen, Wei, Fangyun, Hu, Han

arXiv.org Artificial IntelligenceApr-25-2019

The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by non-local network are almost the same for different query positions within an image. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further observe that this simplified design shares similar structure with Squeeze-Excitation Network (SENet). Hence we unify them into a three-step general framework for global context modeling. Within the general framework, we design a better instantiation, called the global context (GC) block, which is lightweight and can effectively model the global context. The lightweight property allows us to apply it for multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks. The code and configurations are released at https://github.com/xvjiarui/GCNet.

artificial intelligence, machine learning, query position, (17 more...)

arXiv.org Artificial Intelligence

1904.11492

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback